Natural Language Inference is an important task for Natural Language Understanding. It is concerned with classifying the logical relation between two sentences. In this paper, we propose several generative neural networks for generating text hypotheses, which allows the construction of new Natural Language Inference datasets. To evaluate the models, we propose a new metric: the accuracy of a classifier trained on the generated dataset. The accuracy obtained by our best generative model is only 2.7% lower than the accuracy of the classifier trained on the original, human-crafted dataset. Furthermore, the best generated dataset combined with the original dataset achieves the highest accuracy. The best model learns a mapping embedding for each training example. By comparing various metrics, we show that datasets that obtain higher ROUGE or METEOR scores do not necessarily yield higher classification accuracies. We also provide an analysis of the characteristics of a good dataset, including the distinguishability of the generated datasets from the original one.
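To make the proposed evaluation metric concrete, the sketch below trains a classifier on a generated premise-hypothesis dataset and reports its accuracy on the original test split. It is a minimal illustration only: the TF-IDF plus logistic-regression classifier is a stand-in, not the classifier used in the paper, and load_pairs() with its tab-separated file format and the file paths are hypothetical assumptions.

```python
# Minimal sketch of the proposed metric: accuracy on the original test set of a
# classifier trained on a generated NLI dataset. The classifier and data format
# are illustrative assumptions, not the paper's actual setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline


def load_pairs(path):
    """Hypothetical loader: each line holds premise, hypothesis, label (tab-separated)."""
    texts, labels = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            premise, hypothesis, label = line.rstrip("\n").split("\t")
            texts.append(premise + " ||| " + hypothesis)
            labels.append(label)
    return texts, labels


def dataset_accuracy(train_path, test_path):
    """Train on pairs from `train_path`, evaluate accuracy on the original test pairs."""
    train_x, train_y = load_pairs(train_path)  # generated (or original) training pairs
    test_x, test_y = load_pairs(test_path)     # original human-crafted test pairs
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(train_x, train_y)
    return accuracy_score(test_y, clf.predict(test_x))


# Usage (hypothetical file names): compare the generated dataset against the original.
# acc_generated = dataset_accuracy("generated_train.tsv", "original_test.tsv")
# acc_original = dataset_accuracy("original_train.tsv", "original_test.tsv")
# print(f"generated: {acc_generated:.3f}  original: {acc_original:.3f}")
```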